Disambiguating Cue Phrases in Text and Speech

Authors

  • Diane J. Litman
  • Julia Hirschberg
Abstract

Cue phrases are linguistic expressions such as 'now' and 'well' that may explicitly mark the structure of a discourse. For example, while the cue phrase 'incidentally' may be used SENTENTIALLY as an adverbial, the DISCOURSE use initiates a digression. In [8], we noted the ambiguity of cue phrases with respect to discourse and sentential usage and proposed an intonational model for their disambiguation. In this paper, we extend our previous characterization of cue phrases and generalize its domain of coverage, based on a larger and more comprehensive empirical study: an examination of all cue phrases produced by a single speaker in recorded natural speech. We also associate this prosodic model with orthographic and part-of-speech analyses of cue phrases in text. Such a dual model provides both theoretical justification for current computational models of discourse and practical application to the generation of synthetic speech.

1 Introduction

Words and phrases that may directly mark the structure of a discourse have been termed CUE PHRASES, CLUE WORDS, DISCOURSE MARKERS, and DISCOURSE PARTICLES [3, 4, 14, 17, 19]. Some examples are 'now', which marks the introduction of a new subtopic or return to a previous one, 'incidentally' and 'by the way', which indicate the beginning of a digression, and 'anyway' and 'in any case', which indicate return from a digression. In a previous study [8], we noted that such terms are potentially ambiguous between DISCOURSE and SENTENTIAL uses [18]. So, 'now' may be used as a temporal adverbial as well as a discourse marker, 'incidentally' may also function as an adverbial, and other cue phrases similarly have one or more senses in addition to their function as markers of discourse structure. Based upon an empirical study of 'now' in recorded speech, we proposed that such discourse and sentential uses of cue phrases can be disambiguated intonationally.
In particular, we proposed a prosodic model for this disambiguation which discriminated all discourse from sentential uses of tokens in our sample. This model provided not only a plausibility argument for the disambiguation of cue phrases, but also the beginnings of a model for the generation of cue phrases in synthetic speech.

In this paper, we show that our prosodic model generalizes to other cue phrases as well. We further propose an initial model for the disambiguation of cue phrases in text. We base these claims upon a further empirical study: an examination of all cue phrases produced by a single speaker in part of a recorded, transcribed lecture. In Section 2 we review our own and other work on cue phrases, in Section 3 we describe our current empirical studies, in Section 4 we present the results of our analysis, and in Section 5 we discuss theoretical and practical applications of our findings.

2 Previous Studies

The important role that cue phrases play in understanding and generating discourse has been well documented in the computational linguistics literature. For example, by indicating the presence of a structural boundary or a relationship between parts of a discourse, cue phrases can assist in the resolution of anaphora [5, 4, 17] and in the identification of rhetorical relations [10, 12, 17]. Cue phrases have also been used to reduce the complexity of discourse processing and to increase textual coherence [3, 11, 21]. In Example (1)¹, interpretation of the anaphor 'it' as (correctly) co-indexed with THE SYSTEM is facilitated by the presence of the cue phrases 'say' and 'then', marking potential antecedents in '... as an EXPERT DATABASE for AN EXPERT SYSTEM ...' as structurally unavailable.²

(1) "If THE SYSTEM attempts to hold rules, say as AN EXPERT DATABASE for AN EXPERT SYSTEM, then we expect it not only to hold the rules but to in fact apply them for us in appropriate situations."

* We thank Bengt Altenberg, Richard Omanson and Jan van Santen for providing information and helpful comments on this work.
¹ The examples are taken from the corpus described in Section 3.
² Informally, 'say' indicates the beginning of a discourse subtopic and 'then' signals a return from that subtopic.
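
The abstract mentions orthographic analyses of cue phrases in text but this excerpt does not spell out the rule used. As a rough illustration only, the following Python sketch labels a cue phrase as a discourse use when it is set off by preceding punctuation or begins the text, and as sentential otherwise. Both the rule and the names `CUE_PHRASES` and `label_cue_phrases` are assumptions made for this example, not the authors' model, and the prosodic and part-of-speech evidence is not modelled at all.

```python
import re

# Hypothetical inventory, limited to the cue phrases named in the excerpt.
CUE_PHRASES = ("now", "well", "incidentally", "by the way",
               "anyway", "in any case", "say", "then")


def label_cue_phrases(text):
    """Return (phrase, offset, label) tuples in order of appearance.

    Assumed toy rule: a cue phrase that begins the text or directly
    follows a punctuation mark is labelled "discourse"; any other
    occurrence is labelled "sentential".
    """
    results = []
    for phrase in CUE_PHRASES:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            # Text to the left of the match, ignoring trailing spaces.
            before = text[:match.start()].rstrip()
            set_off = (not before) or before[-1] in ".,;:!?"
            label = "discourse" if set_off else "sentential"
            results.append((match.group(0), match.start(), label))
    return sorted(results, key=lambda item: item[1])


if __name__ == "__main__":
    example = ("If THE SYSTEM attempts to hold rules, say as AN EXPERT "
               "DATABASE for AN EXPERT SYSTEM, then we expect it not only "
               "to hold the rules but to in fact apply them for us in "
               "appropriate situations.")
    for phrase, offset, label in label_cue_phrases(example):
        print(phrase, offset, label)
```

On Example (1) this toy rule marks both 'say' and 'then' as discourse uses, because each follows a comma. A sentence such as "We are now ready." would have 'now' labelled sentential. Punctuation alone is a crude cue, so the sketch should be read as a starting point for experimentation, not as a substitute for the intonational evidence the paper relies on.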


Similar resources

Empirical Studies on the Disambiguation of Cue Phrases

Cue phrases are linguistic expressions such as now and well that function as explicit indicators of the structure of a discourse. For example, now may signal the beginning of a subtopic or a return to a previous topic, while well may mark subsequent material as a response to prior material, or as an explanatory comment. However, while cue phrases may convey discourse structure, each also has on...


Classifying Cue Phrases in Text and Speech Using Machine Learning

Cue phrases may be used in a discourse sense to explicitly signal discourse structure, but also in a sentential sense to convey semantic rather than structural information. This paper explores the use of machine learning for classifying cue phrases as discourse or sentential. Two machine learning programs (cgrendel and C4.5) are used to induce classification rules from sets of pre-classified cu...


Spoken-style explanation generator for Japanese kanji using a text-to-speech system

In this paper we describe a spoken explanation generator, PLANET, for Japanese Kanji (ideograms), especially Kanji used in people's names. A number of text-to-speech systems for Kanji texts have been proposed, but this is the first one that can explain Kanji characters so as to disambiguate characters from many homophone Kanji candidates. To accomplish this the generator explains the Kanji by usin...


Determining the Boundaries and Types of Syntactic Phrases in Persian Texts

Text tokenization is the process of splitting text into meaningful tokens such as words, phrases, sentences, etc. Tokenization into syntactic phrases, known as chunking, is an important preprocessing step needed in many applications such as machine translation, information retrieval, text to speech, etc. In this paper chunking of Farsi texts is done using statistical and learning methods and the grammat...


Cue Phrase Classification Using Machine Learning

Cue phrases may be used in a discourse sense to explicitly signal discourse structure, but also in a sentential sense to convey semantic rather than structural information. Correctly classifying cue phrases as discourse or sentential is critical in natural language processing systems that exploit discourse structure, e.g., for performing tasks such as anaphora resolution and plan recognition. T...




Publication date: 1990